Abstract

We study the problem of disseminating a piece of information through 
all the nodes of a network, given that it is known originally only to a 
single node. In the absence of any structural knowledge on the network 
other than the nodes' neighborhoods, this problem is traditionally solved 
by flooding all the network's edges. We analyze a recently introduced 
probabilistic algorithm for flooding and give an alternative probabilistic 
heuristic that can lead to some cost-effective improvements, like better 
trade-offs between the message and time complexities involved. We analyze
the two algorithms both mathematically and by means of simulations,
always within a random-graph framework and considering relevant node-degree
distributions.

Keywords: Random networks, Probabilistic flooding, Heuristic flooding.



1 Introduction 

A network can be viewed as an undirected graph $G = (N_G, E_G)$ with $n = |N_G|$
nodes and $m = |E_G|$ edges, in which the existence of an edge (u, v) represents the
possibility of bidirectional communication between nodes u and v. We consider
the problem of disseminating a piece of information, referred to as I, to all the
nodes of the network, given that initially only one node, called the originator,
has it.

Corresponding author (valmir@cos.ufrj.br).
A traditional algorithm to disseminate information in networks when nodes
know their immediate neighborhoods and nothing else is what we call uninformed
flooding, in allusion to the fact that its actions do not in any way take
into account the structure of the network or any of its properties. In this algorithm,
the originator starts by sending I to its neighbors; when receiving I for
the first time, each of the other nodes forwards it to its own neighbors. This
algorithm has message complexity of O(m), as exactly two messages are transmitted
on each edge, and worst-case time complexity of O(n), the latter related
to the customary message-passing causal chains of distributed computing under
full asynchronism [3]. Also, when edges can be assumed to have approximately
equal delays associated with them, the algorithm's average waiting time, which
is the average time for a node to receive I, is given by the average distance
from the originator in the network, that is, the average number of edges on the
shortest paths between the originator and each of the other nodes.
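To make the message count and the average waiting time concrete, here is a small Python sketch of our own (not the authors' code). It assumes a hypothetical adjacency-dictionary representation of G and unit delays on every edge, so that breadth-first order coincides with the order in which nodes first receive I. Every node forwards I to all of its neighbors exactly once, including the one it received I from.

```python
from collections import deque

def uninformed_flooding(adj, originator):
    """Simulate uninformed flooding with unit edge delays.

    adj maps each node to the list of its neighbors (undirected graph).
    Returns (messages, dist): the number of messages sent and, for every
    reached node, its distance in edges from the originator.
    """
    dist = {originator: 0}
    messages = 0
    queue = deque([originator])
    while queue:
        u = queue.popleft()          # u has just received I for the first time
        for v in adj[u]:             # ... and forwards it to all of its neighbors
            messages += 1
            if v not in dist:        # first receipt at v
                dist[v] = dist[u] + 1
                queue.append(v)
    return messages, dist
```

On a connected graph the message count equals 2m, and the average waiting time is `sum(dist.values()) / (len(dist) - 1)`, the mean distance of the non-originator nodes.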

Our interest in this paper is to study the trade-off between the message
complexity and the average waiting time when I is no longer forwarded
deterministically to all of a node's neighbors, but rather is sent as the result of
probabilistic decisions. Our initial motivation has been the recent introduction
of probabilistic flooding [?], which prescribes for each node that it forward I
to each of its neighbors with fixed probability p. Probabilistic flooding is then
also "uninformed," constituting essentially a simple stochastic generalization of
the aforementioned (deterministic) uninformed flooding. We introduce in Section 2
the alternative that we call heuristic flooding, which is also probabilistic
in nature but, unlike probabilistic flooding, takes the network's structure into
account insofar as it can be inferred from the nodes' immediate neighborhoods.

Naturally, flooding the network with copies of I that are propagated
probabilistically does not ensure that all nodes will eventually receive a copy. There
do exist applications, however, that may benefit from the aforementioned trade-off
even in the absence of such delivery guarantees. We refer the reader to [2]
and to the references therein for examples in the area of peer-to-peer computing
when I is interpreted as a query and the flooding as a search.

We base our analyses of both algorithms on the generating functions of
discrete probability distributions [10]. The formalism we utilize is the one laid
down in [15], whose details are briefly reviewed and extended in Section 3. Our
analyses are also based on what we call a flooding digraph, which is a random
directed subgraph of the network. In Section 4 we introduce this subgraph and
use it to obtain our analytical results.

All our analyses are carried out in the framework of random graphs [?].
Within this framework, we give special attention to random graphs whose node
degrees are either Poisson-distributed or distributed according to a power law.
The former arise in the classic context of Erdős and Rényi [7], while the latter
have recently been found to be representative of networks like the Internet and
the WWW [?]. Simulation results are given in Section 5 for these models
of random graphs.

A comparative study of the two probabilistic approaches is given in Section 6,
and concluding remarks in Section 7.
2 Heuristic flooding



In probabilistic flooding, each node decides whether to send I to each of its
neighbors based on a fixed probability parameter p. As a consequence, a node
with a small degree has a smaller probability of receiving the information than
a node with a large degree. Given a node u and one of its neighbors v, if a is
the degree of u and b the degree of v, our new heuristic flooding is based on
a function h(a, b) representing the probability that node u sends I to node v.
Intuitively, h(a, b) is expected to have a larger value when at least one of a and
b is small than when they are both large, thus attempting to compensate for the
inherent drawback of probabilistic flooding that we just mentioned, and also to
reflect the understanding that a small a may signify the existence of insufficient
alternative routes for v to ultimately receive I from u even when b is large.

Ideally, an accurate heuristic function would require some knowledge about
the topology of the network beyond the nodes' immediate neighborhoods;
realistically, however, a node can only be assumed to have information about its
own neighbors. For this reason, heuristic flooding is defined to operate under
the assumption by each node u that a neighboring node v can only receive the
information from min{a, b} of its own neighbors, from each one with equal
probability h(a, b), where a and b are the degrees of u and v, respectively. In other
words, there are two sides to our assumption. One side is that, if u decides not
to forward I to v, then v can still receive I from min{a, b} − 1 of its neighbors,
if any, in each case with the same probability h(a, b). The other side of the
assumption is that the other b − min{a, b} neighbors of v, if any, are unable to
receive I if v does not send it to them.

If we let α stand for the desired probability that a node receives the
information being broadcast, then, from the perspective of node u upon deciding
whether to send the information to its neighbor v, the probability that v does
not receive the information (that is, 1 − α) can be expressed as

$$1-\alpha = \left[1-\alpha+\alpha\bigl(1-h(a,b)\bigr)\right]^{\min\{a,b\}}. \tag{1}$$

This expression indicates that v does not receive the information if and only
if, for each of the min{a, b} neighbors that the assumption says could send the
information to it, either that neighbor has not itself received the information
(with probability 1 − α) or it has but decided not to forward it to v (with
probability α(1 − h(a, b))). Hence,

$$h(a,b) = \frac{1-(1-\alpha)^{1/\min\{a,b\}}}{\alpha}. \tag{2}$$

This heuristic function has the property that, when both a and b are large,
h(a, b) is small, which corresponds to the intuition that there may exist other
paths along which v can receive I, so it may not be essential that u send it to
v. On the other hand, when at least one of a and b is small, h(a, b) is large.
In particular, when at least one of a and b is equal to 1, that is, min{a, b} = 1,
h(a, b) is also 1. Illustrative plots of h(a, b) against min{a, b} are shown in
Figure 1 for α = 0.90, 0.95, 0.99.
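Equation (2) is cheap to evaluate locally. The minimal Python sketch below uses `heuristic_h`, a name of our own choosing. It computes the forwarding probability and can be checked against Eq. (1), since (1 − α h(a, b))^min{a,b} should come back as 1 − α.

```python
def heuristic_h(a, b, alpha):
    """Forwarding probability h(a, b) of heuristic flooding, Eq. (2).

    a, b:  degrees of the sender u and of the prospective recipient v (both >= 1).
    alpha: desired probability that a node receives the information, 0 < alpha < 1.
    """
    k = min(a, b)
    return (1.0 - (1.0 - alpha) ** (1.0 / k)) / alpha

# h decreases as min{a, b} grows and equals 1 when min{a, b} = 1.
for k in (1, 2, 5, 20):
    print(k, round(heuristic_h(k, 50, 0.95), 4))
```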
3 The generating-function formalism

Henceforth, G is viewed as a random graph whose nodes all have degrees that are
independent of one another and distributed identically to a random variable $K_G$.
The nodes of G are assumed to be interconnected at random given their degrees,
so the degrees of any two adjacent nodes remain independent. The results from
[15] reviewed in Section 3.1, and also their extensions in Section 3.2, hold in the
limiting case of a formally infinite number of nodes.

3.1 Basic results 

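Denoting by $P_G(a)$ the probability that a randomly chosen node of G has degree
a, with a ≥ 0, or equivalently the probability that $K_G$ equals a, the generating
function for the degree distribution of G is $G_0(x)$ such that

$$G_0(x) = \langle x^{K_G}\rangle = \sum_{a=0}^{n-1} x^a P_G(a), \tag{3}$$

where we employ the usual angle-bracket notation to indicate expectation.
Necessarily $G_0(1) = 1$, and the average degree of G, denoted by $z_G$, is the
expectation of $K_G$, that is,

$$z_G = \langle K_G\rangle = \sum_{a=0}^{n-1} aP_G(a) = \left[\frac{d}{dx}G_0(x)\right]_{x=1} = G_0'(1). \tag{4}$$

More generally, and for s ≥ 1, the sth moment of $K_G$, $\langle K_G^s\rangle$, is

$$\langle K_G^s\rangle = \sum_{a=0}^{n-1} a^s P_G(a) = \left[\left(x\frac{d}{dx}\right)^{s} G_0(x)\right]_{x=1}. \tag{5}$$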
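Equations (3) to (5) can be checked numerically for any finite degree distribution. The Python sketch below is illustrative only. It assumes a truncated Poisson distribution, built by our helper `poisson_degrees`, and represents $G_0$ by its coefficient list, so that applying x d/dx and then setting x = 1 amounts to summing the transformed coefficients.

```python
import math

def poisson_degrees(z, n):
    """Poisson(z) degree probabilities P_G(0), ..., P_G(n-1), truncated at n - 1."""
    probs, p = [], math.exp(-z)
    for a in range(n):
        probs.append(p)
        p *= z / (a + 1)   # P(a + 1) = P(a) * z / (a + 1)
    return probs

def G0(P, x):
    """Generating function of Eq. (3): sum over a of x**a * P_G(a)."""
    return sum(p * x ** a for a, p in enumerate(P))

def moment(P, s):
    """s-th moment of Eq. (5): sum over a of a**s * P_G(a)."""
    return sum(a ** s * p for a, p in enumerate(P))

def x_ddx(coeffs):
    """Apply the operator x d/dx to a polynomial given by its coefficients."""
    return [a * c for a, c in enumerate(coeffs)]

P = poisson_degrees(3.0, 60)
print(G0(P, 1.0), moment(P, 1), moment(P, 2))
```

For Poisson degrees of mean z the first two moments should come out as z and z² + z, a fact used again in Section 5.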
We know from [15] that a criterion exists according to which a random
graph almost surely has a size-Θ(n) connected component, referred to as the
giant connected component (GCC). When this is the case, the other components
of the graph are small compared to its number of nodes, that is, the fraction
of nodes inside each of them is approximately 0. The criterion in the case of G
is that

$$\langle K_G^2\rangle - 2\langle K_G\rangle > 0. \tag{6}$$

If this inequality holds, G is said to be above the phase transition. If it does
not, then all the components of G are small and a GCC does not exist. In this
case, G is said to be below the phase transition.
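Criterion (6) only needs the first two moments of the degree distribution. The following Python sketch illustrates it. The helper names are ours, and the Poisson truncation at 50 terms is an assumption that is harmless for the small means used here. For Poisson degrees ⟨K²⟩ − 2⟨K⟩ = z² − z, so the test flips at z = 1.

```python
import math

def poisson_degrees(z, n):
    """Poisson(z) degree probabilities, truncated at n - 1."""
    probs, p = [], math.exp(-z)
    for a in range(n):
        probs.append(p)
        p *= z / (a + 1)
    return probs

def above_phase_transition(P):
    """Criterion (6): True if <K_G^2> - 2<K_G> > 0 for the degree distribution P."""
    k1 = sum(a * p for a, p in enumerate(P))
    k2 = sum(a * a * p for a, p in enumerate(P))
    return k2 - 2 * k1 > 0

for z in (0.5, 0.9, 1.1, 2.0):
    print(z, above_phase_transition(poisson_degrees(z, 50)))
```

This reports False for the two values below 1 and True for the two above it.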

Let us now consider a node at which we arrive by following a randomly
chosen edge of G. The probability that such a node has b − 1 other edges
incident to it, with b ≥ 1, is the expected fraction of edges incident to degree-b
nodes, which is given by

$$\frac{bP_G(b)}{\sum_{c=1}^{n-1} cP_G(c)} = \frac{bP_G(b)}{z_G}. \tag{7}$$

So the number of remaining edges incident to such a node is distributed in a
way that can be generated by

$$G_1(x) = \sum_{b=1}^{n-1} x^{b-1}\,\frac{bP_G(b)}{z_G} = \frac{G_0'(x)}{z_G}. \tag{8}$$

Now let two nodes be called r-neighbors of each other, for r ≥ 1, if the
distance between them in G is r. The case of r = 1 is simply the case of
neighbors in G. Given a neighbor of a randomly chosen node u, the expected
number of that neighbor's other neighbors (i.e., excluding u) is

$$\sum_{b=1}^{n-1}(b-1)\,\frac{bP_G(b)}{z_G} = \left[\frac{d}{dx}G_1(x)\right]_{x=1} = G_1'(1) = \frac{G_0''(1)}{z_G}, \tag{9}$$

so the expected number of 2-neighbors of u, which we denote by $z_G^{(2)}$, is given
by

$$z_G^{(2)} = \sum_{a=0}^{n-1} aP_G(a)\sum_{b=1}^{n-1}(b-1)\,\frac{bP_G(b)}{z_G} = \sum_{b=1}^{n-1}(b-1)\,bP_G(b). \tag{10}$$
In general, we denote a node's expected number of r-neighbors by $z_G^{(r)}$. Clearly,
$z_G^{(1)} = z_G$, and (10) can be generalized to yield

$$z_G^{(r)} = z_G^{(r-1)}\sum_{b=1}^{n-1}(b-1)\,\frac{bP_G(b)}{z_G} = \left(\frac{z_G^{(2)}}{z_G}\right)^{r-1} z_G. \tag{11}$$

Note that here we have repeatedly taken into account, as we consider nodes'
neighbors that are progressively farther from the initial, randomly chosen node
u, that the probability that any two nodes involved in the process are in fact
the same node is 0 in the limit as n → ∞.

We can then obtain an approximation for the expected path length of G
by summing the values of $z_G^{(r)}$ from r = 1 up until the sum becomes equal to
n − 1. When this happens, the current value of r can be taken as $L_G$, the desired
expected path length. Thus,

$$\sum_{r=1}^{L_G} z_G^{(r)} = n-1, \tag{12}$$

which yields

$$L_G = \frac{\ln\left[\dfrac{n-1}{z_G}\left(\dfrac{z_G^{(2)}}{z_G}-1\right)+1\right]}{\ln\left(z_G^{(2)}/z_G\right)}. \tag{13}$$
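Equation (13) is the closed-form solution of (12) when the terms of (11) form a geometric progression with ratio $z_G^{(2)}/z_G$. The short Python sketch below uses function names of our own. It computes $z_G$, $z_G^{(2)}$ and $L_G$ (treating $L_G$ as a real number) and can confirm that the geometric sum hits n − 1 exactly at $L_G$.

```python
import math

def neighbor_counts(P):
    """Return (z_G, z_G^(2)) as in Eqs. (4) and (10)."""
    z1 = sum(a * p for a, p in enumerate(P))
    z2 = sum((b - 1) * b * p for b, p in enumerate(P))   # the b = 0 term is zero
    return z1, z2

def expected_path_length(n, z1, z2):
    """Eq. (13); requires z2 > z1 so that the progression ratio exceeds 1."""
    ratio = z2 / z1
    return math.log((n - 1) / z1 * (ratio - 1) + 1) / math.log(ratio)

# Poisson degrees with z = 10 have z_G^(2) = z^2 = 100; with n = 10000 this gives L_G of about 3.95.
print(expected_path_length(10000, 10.0, 100.0))
```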

We henceforth assume that G is above the phase transition, that is, G almost
surely has a GCC. Given two adjacent nodes u and v, let the reach of u through
v be the set of nodes reachable by a path starting at u whose first edge is (u, v).
A randomly chosen node is outside the GCC if and only if all of its neighbors
are also outside the GCC, that is, it has a small, size-o(n) reach through each
of its neighbors. Denoting by q the probability that this happens for each of
those neighbors, we can express q as

$$q = \sum_{a=1}^{n-1} q^{a-1}\,\frac{aP_G(a)}{z_G} = G_1(q), \tag{14}$$

that is, a node has a small reach through one of its neighbors if and only if
that neighbor itself has a small reach through each of its other neighbors. As a
result, the probability that a randomly chosen node is outside the GCC is

$$\sum_{a=0}^{n-1} q^a P_G(a) = G_0(q), \tag{15}$$
and the fraction of nodes inside the GCC of G, denoted by $\theta_G$, is given by

$$\theta_G = 1 - G_0(q). \tag{16}$$
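In practice q can be obtained from (14) by fixed-point iteration, and $\theta_G$ then follows from (16). In the sketch below, which is ours and not the paper's, starting the iteration at q = 0 is a choice we make. Because $G_1$ is increasing on [0, 1], the iterates rise monotonically to the smallest root of q = G₁(q), which is the relevant one. The trivial root q = 1 always exists.

```python
import math

def poisson_degrees(z, n):
    """Poisson(z) degree probabilities, truncated at n - 1."""
    probs, p = [], math.exp(-z)
    for a in range(n):
        probs.append(p)
        p *= z / (a + 1)
    return probs

def gcc_fraction(P, tol=1e-13, max_iter=10000):
    """Solve q = G_1(q), Eq. (14), by fixed-point iteration; return (q, theta_G), Eq. (16)."""
    z = sum(a * p for a, p in enumerate(P))
    G1 = lambda x: sum(a * p * x ** (a - 1) for a, p in enumerate(P) if a >= 1) / z
    G0 = lambda x: sum(p * x ** a for a, p in enumerate(P))
    q = 0.0                      # iterate upward from 0 toward the smallest root
    for _ in range(max_iter):
        q_next = G1(q)
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return q, 1.0 - G0(q)

q, theta = gcc_fraction(poisson_degrees(2.0, 60))
print(q, theta)
```

For Poisson degrees $G_0 = G_1$, so $\theta_G$ must satisfy $\theta_G = 1 - e^{-z\theta_G}$ (about 0.797 for z = 2). This gives an easy consistency check.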



3.2 Extensions within the GCC 

One further assumption that we make in this paper is that the originator, the
node that initially has the information and starts the dissemination, is inside
the GCC. In this case our analyses must be based on a degree distribution that
is conditioned upon this membership in the GCC. The conditional probability
that a node inside the GCC has degree a, denoted by $P_G(a \mid {\rm GCC})$, can be
written using Bayes' rule as

$$P_G(a\mid{\rm GCC}) = \frac{P_G({\rm GCC}\mid a)\,P_G(a)}{P_G({\rm GCC})}. \tag{17}$$

Here $P_G({\rm GCC})$ is the probability that a randomly chosen node is inside the
GCC, that is, $\theta_G$. $P_G({\rm GCC}\mid a)$, in turn, is the probability that a degree-a
node is inside the GCC. Such a node is outside the GCC if and only if it has
a small reach through each of its neighbors, which occurs with probability q
for each neighbor. Then a degree-a node is inside the GCC with probability
$P_G({\rm GCC}\mid a) = 1 - q^a$. We then have

$$P_G(a\mid{\rm GCC}) = \frac{(1-q^a)\,P_G(a)}{\theta_G}. \tag{18}$$

We can now obtain the generating function for the degree of a randomly
chosen node inside the GCC, which we denote by $G_0^{\rm GCC}(x)$, as

$$G_0^{\rm GCC}(x) = \sum_{a=0}^{n-1} x^a\,\frac{(1-q^a)\,P_G(a)}{\theta_G} = \frac{G_0(x) - G_0(qx)}{\theta_G}. \tag{19}$$

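Also, following our earlier steps, this can be used to calculate the expected
number of neighbors and 2-neighbors of a randomly chosen node inside the
GCC, which we refer to as $z_{\rm GCC}$ and $z_{\rm GCC}^{(2)}$, respectively. We obtain

$$z_{\rm GCC} = \left[\frac{d}{dx}G_0^{\rm GCC}(x)\right]_{x=1} = \frac{z_G - qG_0'(q)}{\theta_G} \tag{20}$$

and

$$z_{\rm GCC}^{(2)} = z_{\rm GCC}\sum_{a=1}^{n-1}(a-1)\,\frac{aP_G(a)}{z_G}. \tag{21}$$

We can also retrace the steps that led us to (13) and obtain $L_{\rm GCC}$, the expected
path length inside the GCC:

$$L_{\rm GCC} = \frac{\ln\left[\dfrac{n-1}{z_{\rm GCC}}\left(\dfrac{z_{\rm GCC}^{(2)}}{z_{\rm GCC}}-1\right)+1\right]}{\ln\left(z_{\rm GCC}^{(2)}/z_{\rm GCC}\right)}. \tag{22}$$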
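The conditioning in (18) and the closed form (20) can be cross-checked numerically. In this Python sketch (helper names ours), we assume Poisson degrees so that q can be iterated directly as $q = e^{z(q-1)}$. The mean of the conditioned distribution should then agree with $(z_G - qG_0'(q))/\theta_G$.

```python
import math

def poisson_degrees(z, n):
    """Poisson(z) degree probabilities, truncated at n - 1."""
    probs, p = [], math.exp(-z)
    for a in range(n):
        probs.append(p)
        p *= z / (a + 1)
    return probs

def gcc_degree_distribution(P, q):
    """P_G(a | GCC) of Eq. (18), given the root q of Eq. (14); also returns theta_G."""
    theta = 1.0 - sum(p * q ** a for a, p in enumerate(P))   # Eq. (16)
    return [(1.0 - q ** a) * p / theta for a, p in enumerate(P)], theta

z = 2.0
P = poisson_degrees(z, 60)
q = 0.0
for _ in range(500):              # for Poisson degrees, G_1(x) = exp(z (x - 1))
    q = math.exp(z * (q - 1.0))
P_gcc, theta = gcc_degree_distribution(P, q)
z_gcc_direct = sum(a * p for a, p in enumerate(P_gcc))
G0_prime_q = sum(a * p * q ** (a - 1) for a, p in enumerate(P) if a >= 1)
z_gcc_eq20 = (z - q * G0_prime_q) / theta                    # Eq. (20)
print(z_gcc_direct, z_gcc_eq20)
```

As expected, nodes inside the GCC have a larger mean degree than nodes in G as a whole (about 2.41 against 2 here), because degree-0 and low-degree nodes are the likeliest to be left out.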
4 Mathematical analysis



We start by considering a generic probabilistic algorithm for disseminating a
piece of information I on the network represented by G. In this algorithm,
when a node of degree a receives I for the first time, it forwards I to each of
its neighbors with probability f(a, b), where b is the degree of the prospective
recipient. This process induces the appearance of a random directed subgraph
$F = (N_F, E_F)$ of G that we call the flooding digraph. This digraph has the
same nodes as G (that is, $N_F = N_G$) and its edges are such that, for $(u, v) \in E_G$,
the directed edge $(u \to v)$ exists in $E_F$ with probability f(a, b), given that a
and b are the degrees of nodes u and v in G, respectively.

Clearly, this generic algorithm can stand for both probabilistic flooding and
heuristic flooding. In the former case, f(a, b) = p regardless of a or b; in the
latter, f(a, b) is the heuristic function h(a, b) given by (2). As probabilistic
flooding can be treated as a special case of heuristic flooding (with h(a, b) replaced
by the constant p), for the remainder of this section we concentrate solely on the
latter and use the h(a, b) of (2) instead of f(a, b).
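The generic algorithm can be simulated by sampling the edges of F lazily, only when a node is actually reached. The Python sketch below is our own illustration. It assumes unit edge delays and an adjacency-dictionary graph. Passing `f = lambda a, b: p` gives probabilistic flooding, and a function implementing h(a, b) of (2) gives heuristic flooding. Since each reached node makes its forwarding decisions exactly once, the breadth-first distances it returns are directed distances in the sampled F.

```python
import random
from collections import deque

def generic_flooding(adj, originator, f, rng):
    """One run of the generic probabilistic flooding with unit edge delays.

    adj: dict mapping each node to a list of its neighbors (undirected G).
    f:   forwarding probability f(a, b) for sender degree a and recipient degree b.
    rng: a random.Random instance.
    Returns (messages, dist), where dist maps each reached node to its directed
    distance from the originator in the sampled flooding digraph F.
    """
    deg = {u: len(vs) for u, vs in adj.items()}
    dist = {originator: 0}
    messages = 0
    queue = deque([originator])
    while queue:
        u = queue.popleft()                   # u has just received I for the first time
        for v in adj[u]:
            # The directed edge (u -> v) of F is present with probability f(a, b).
            if rng.random() < f(deg[u], deg[v]):
                messages += 1
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return messages, dist
```

With f identically 1 the sketch reduces to uninformed flooding, which sends 2m messages on a connected graph. With f identically 0 only the originator holds I.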

4.1 Random digraphs 

Let us first review some of the properties of random digraphs in general and 
also how they apply to the case of F. 

For u a node of a digraph, its in-neighbors are those nodes from which an 
edge exists directed toward u; its out-neighbors are those nodes toward which 
an edge exists directed from u. Moreover, a node is an r-in-neighbor of u if the 
directed distance from it to u is r. Similarly, a node is an r-out-neighbor of u if 
the directed distance from u to it is r. When a directed path exists from node 
u to node v, we say that v is reachable from u (equivalently, u reaches v). 

A connected component of the undirected graph that underlies a digraph
(i.e., the graph that we obtain when edge directions are ignored) is called a
weakly connected component of the digraph. Thus a digraph has a size-Θ(n)
weakly connected component, referred to as the giant weakly connected component
(GWCC), if and only if the undirected graph that underlies it has a
GCC.

The GWCC of a digraph has four distinguished types of sub-digraphs. In
the case of F, they are as illustrated in Figure 2 and such that:

• The giant strongly connected component (GSCC) is the largest sub-digraph
of the GWCC that is maximal with respect to the property that any of
its nodes is reachable from any other.

• The giant in-component (GIN), comprising a fraction $\theta_F^{\rm in}$ of the nodes of G
(or a fraction $\theta_F^{\rm in}/\theta_G$ of the nodes of the GCC of G), contains all the nodes
that can reach the GSCC (including, by definition, those of the GSCC).

• The giant out-component (GOUT), comprising a fraction $\theta_F^{\rm out}$ of the nodes
of G (or a fraction $\theta_F^{\rm out}/\theta_G$ of the nodes of the GCC of G), contains all the
nodes that are reachable from the GSCC (including, by definition, those
of the GSCC).

• Each of the so-called tendrils consists of some of the remaining nodes [?].

Figure 2: The structure of the GWCC of F and its relation to G and to the
GCC of G.



4.2 Number of nodes reached 

We now consider the expected fraction of nodes of G that receive I in heuristic
flooding. Recall that, by assumption, G almost surely has a GCC and the originator
is one of the nodes of the GCC. Two cases must be considered. The first
case corresponds to the originator being outside the GIN, belonging therefore
either to the portion of the GOUT that does not intersect the GSCC or to a
tendril. This case happens with probability $1 - \theta_F^{\rm in}/\theta_G$ and leads to a negligibly
small number of nodes reached by the flooding. The second case is the one in
which the originator is inside the GIN. It occurs with probability $\theta_F^{\rm in}/\theta_G$, and
the flooding then necessarily reaches all the nodes of the GOUT. Nodes outside the
GOUT may also be reached but contribute negligibly, since they belong either
to the portion of the GIN that does not intersect the GSCC or to a tendril. So,
in essence, the probability that a node is reached, denoted by $P_n$, is

$$P_n = \frac{\theta_F^{\rm in}\,\theta_F^{\rm out}}{\theta_G^2}. \tag{23}$$

In order to calculate $\theta_F^{\rm in}$ and $\theta_F^{\rm out}$, let us first define the ancestry of a node as
the set of nodes from which it can be reached in F, and the descent of a node
as the set of nodes reachable from it in F. Then a randomly chosen node is
inside the GIN if and only if it has a large, size-Θ(n) descent. Now consider a
randomly chosen node u having degree a in G. We say that a neighbor v of u
in G is a dead-end with respect to u if either $(u \to v)$ is not an edge of F or it
is but v has a small descent. If b is the degree of v in G, we denote by $q_b^{\rm out}$ the
conditional probability that v has a small descent, given that it has a directed
edge in F incoming from u. The probability that a degree-b node is a dead-end
with respect to a degree-a neighbor in G, which we denote by $w_{a,b}^{\rm out}$, is then

$$w_{a,b}^{\rm out} = 1 - h(a,b) + h(a,b)\,q_b^{\rm out}. \tag{24}$$



But the probability that a node's neighbor in G has degree b is given by (7), so
the probability that a given neighbor of a degree-a node is a dead-end is

$$\sum_{b=1}^{n-1} w_{a,b}^{\rm out}\,\frac{bP_G(b)}{z_G}. \tag{25}$$

We also know that such a degree-b node has a small descent if and only if each
of its b − 1 other neighbors in G is itself a dead-end with respect to it. Thus,
$q_b^{\rm out}$ is such that

$$q_b^{\rm out} = \left(\sum_{c=1}^{n-1} w_{b,c}^{\rm out}\,\frac{cP_G(c)}{z_G}\right)^{b-1}. \tag{26}$$

It now suffices to recognize that a randomly chosen node is outside the GIN
if and only if all of its neighbors in G are dead-ends with respect to it. The
fraction of nodes inside the GIN is then given by

$$\theta_F^{\rm in} = 1 - \sum_{a=0}^{n-1}\left(\sum_{b=1}^{n-1} w_{a,b}^{\rm out}\,\frac{bP_G(b)}{z_G}\right)^{a} P_G(a). \tag{27}$$

The equations in (24) and (26) lead to a nonlinear system of n − 1 equations
(letting b = 1, ..., n − 1 in (26)) on n − 1 variables ($q_1^{\rm out}$ through $q_{n-1}^{\rm out}$). A
solution of this system within $[0,1]^{n-1}$ can be used, via (24), to calculate $\theta_F^{\rm in}$ in (27).

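The calculation of $\theta_F^{\rm out}$ follows a completely analogous development, since a
randomly chosen node is inside the GOUT if and only if it has a large, size-Θ(n)
ancestry. This leads to

$$w_{a,b}^{\rm in} = 1 - h(b,a) + h(b,a)\,q_b^{\rm in}, \tag{28}$$

where

$$q_b^{\rm in} = \left(\sum_{c=1}^{n-1} w_{b,c}^{\rm in}\,\frac{cP_G(c)}{z_G}\right)^{b-1} \tag{29}$$

is the conditional probability that a degree-b node having a degree-a neighbor in
G has a small ancestry, given that in F it has an edge outgoing to that degree-a
neighbor. The counterpart of (27) is then

$$\theta_F^{\rm out} = 1 - \sum_{a=0}^{n-1}\left(\sum_{b=1}^{n-1} w_{a,b}^{\rm in}\,\frac{bP_G(b)}{z_G}\right)^{a} P_G(a). \tag{30}$$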
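The nonlinear systems (24) to (27) and (28) to (30) can be solved by the same kind of fixed-point iteration used for (14). The Python sketch below is ours. Its function names, the starting point q = 0 (which makes the iterates increase monotonically to the smallest fixed point) and the truncated Poisson input are all assumptions of the sketch. The only difference between the two directions is whether the relevant F-edge is present with probability h(a, b) or h(b, a).

```python
import math

def poisson_degrees(z, n):
    """Poisson(z) degree probabilities, truncated at n - 1."""
    probs, p = [], math.exp(-z)
    for a in range(n):
        probs.append(p)
        p *= z / (a + 1)
    return probs

def giant_fraction(P, edge_prob, tol=1e-13, max_iter=10000):
    """Solve (24)-(27) when edge_prob(a, b) = h(a, b), or (28)-(30) when it is h(b, a)."""
    n = len(P)
    z = sum(a * p for a, p in enumerate(P))
    e = [b * P[b] / z for b in range(n)]                 # bP_G(b)/z_G, Eq. (7)

    def mean_dead_end(a, q):
        # sum over b of w_{a,b} bP_G(b)/z_G, with w_{a,b} as in (24) or (28)
        return sum((1.0 - edge_prob(a, b) + edge_prob(a, b) * q[b]) * e[b]
                   for b in range(1, n))

    q = [0.0] * n                                        # q[b] used for b >= 1 only
    for _ in range(max_iter):
        q_next = [0.0] + [mean_dead_end(b, q) ** (b - 1) for b in range(1, n)]  # (26)/(29)
        converged = max(abs(x - y) for x, y in zip(q_next, q)) < tol
        q = q_next
        if converged:
            break
    # Eq. (27)/(30); a degree-0 node is always outside, contributing P_G(0).
    outside = P[0] + sum(mean_dead_end(a, q) ** a * P[a] for a in range(1, n))
    return 1.0 - outside

def flooding_fractions(P, h):
    """Return (theta_F^in, theta_F^out) for a forwarding-probability function h(a, b)."""
    return (giant_fraction(P, lambda a, b: h(a, b)),
            giant_fraction(P, lambda a, b: h(b, a)))
```

As a check, with h identically 1 the solver returns $\theta_G$. With h identically p and Poisson degrees of mean z, both fractions should satisfy the bond-percolation relation $\theta = 1 - e^{-zp\theta}$. $P_n$ then follows from (23) as `t_in * t_out / theta_G ** 2`.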
4.3 Number of messages sent

Let $P_m$ be the ratio of the expected number of messages sent by heuristic flooding
to the expected number of messages sent by uninformed flooding. Given that
the originator of both algorithms is inside the GCC, the expected number of
messages sent by uninformed flooding is $z_{\rm GCC}\,n\theta_G$. Letting $\bar z$ be the expected
out-degree in F of the nodes reached by heuristic flooding, the expected number
of messages in this case is $\bar z\,n\theta_G P_n$. We then have

$$P_m = \frac{\bar z\,P_n}{z_{\rm GCC}}. \tag{31}$$

The value of $\bar z$ can be approximated by the average out-degree of the nodes
inside the GOUT, which we denote by $\bar z_{\rm GOUT}$. Letting $P_F(i \mid {\rm GOUT})$ be the
conditional probability that a randomly chosen node has out-degree i in F, given
that it is inside the GOUT, $\bar z_{\rm GOUT}$ can be expressed as

$$\bar z_{\rm GOUT} = \sum_{i=0}^{n-1} i\,P_F(i\mid{\rm GOUT}). \tag{32}$$

If we let $P_F(i \mid a, {\rm GOUT})$ be the conditional probability that a node has out-degree
i in F, given that it has degree a in G and that it is inside the GOUT,
then $P_F(i \mid {\rm GOUT})$ can be written as

$$P_F(i\mid{\rm GOUT}) = \sum_{a=i}^{n-1} P_F(i\mid a,{\rm GOUT})\,P_G(a\mid{\rm GOUT}). \tag{33}$$

Now recall that every node's degree in G is by assumption independent
of, and identically distributed with, every other node's degree. Thus, letting $\bar h_a$ be the
expectation of the heuristic function h(a, b) as b varies, that is,

$$\bar h_a = \sum_{b=1}^{n-1} h(a,b)\,\frac{bP_G(b)}{z_G}, \tag{34}$$

the conditional probability that a degree-a node has out-degree i in F, which
we denote by $P_F(i \mid a)$, is given by the binomial distribution:

$$P_F(i\mid a) = \binom{a}{i}\,\bar h_a^{\,i}\,(1-\bar h_a)^{a-i}. \tag{35}$$

But given a node inside the GOUT, $bP_G(b)/z_G$ still gives the probability that any
one of its neighbors has degree b in G. Therefore, a node with degree a in G that is
inside the GOUT has out-degree i in F with probability given by (35), that is,
$P_F(i \mid a, {\rm GOUT}) = P_F(i \mid a)$, which leads to

$$P_F(i\mid{\rm GOUT}) = \sum_{a=i}^{n-1} P_F(i\mid a)\,P_G(a\mid{\rm GOUT}). \tag{36}$$
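The quantities in (34) and (35) are straightforward to tabulate. In the short Python sketch below, the function names and the toy degree distribution in the usage are ours. It computes $\bar h_a$ and the binomial out-degree law, whose mean $a\bar h_a$ is the fact used in the next step.

```python
import math

def mean_forward_prob(P, h, a):
    """h-bar_a of Eq. (34): the average of h(a, b) over the degree b of a neighbor."""
    z = sum(b * p for b, p in enumerate(P))
    return sum(h(a, b) * b * P[b] / z for b in range(1, len(P)))

def out_degree_distribution(a, h_bar):
    """P_F(i | a) of Eq. (35): a Binomial(a, h_bar) law for i = 0, ..., a."""
    return [math.comb(a, i) * h_bar ** i * (1.0 - h_bar) ** (a - i) for i in range(a + 1)]

P = [0.0, 0.25, 0.5, 0.25]                 # toy degree distribution with z_G = 2
hb = mean_forward_prob(P, lambda a, b: 0.3, 2)
dist = out_degree_distribution(5, hb)
print(hb, sum(i * x for i, x in enumerate(dist)))   # the mean out-degree is 5 * hb
```

For a constant forwarding probability p, $\bar h_a$ is simply p, since the weights $bP_G(b)/z_G$ sum to 1.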
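We can then rewrite (32) as

$$\begin{aligned}
\bar z_{\rm GOUT} &= \sum_{i=0}^{n-1} i\sum_{a=i}^{n-1}\binom{a}{i}\bar h_a^{\,i}(1-\bar h_a)^{a-i}\,P_G(a\mid{\rm GOUT})\\
&= \sum_{a=0}^{n-1} P_G(a\mid{\rm GOUT})\sum_{i=0}^{a} i\binom{a}{i}\bar h_a^{\,i}(1-\bar h_a)^{a-i},
\end{aligned}\tag{37}$$

where the rightmost summation gives the expected out-degree in F of a node
whose degree in G is a, i.e., $a\bar h_a$. Therefore,

$$\bar z_{\rm GOUT} = \sum_{a=0}^{n-1} P_G(a\mid{\rm GOUT})\,a\bar h_a. \tag{38}$$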



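Now, if we apply Bayes' rule and write $P_G(a \mid {\rm GOUT})$ as

$$P_G(a\mid{\rm GOUT}) = \frac{P_G({\rm GOUT}\mid a)\,P_G(a)}{P_G({\rm GOUT})} = \frac{P_G({\rm GOUT}\mid a)\,P_G(a)}{\theta_F^{\rm out}}, \tag{39}$$

and furthermore recognize that a node of degree a in G is outside the GOUT
(with probability $1 - P_G({\rm GOUT}\mid a)$) if and only if each of its neighbors in G is
like the degree-b nodes at the end of Section 4.2, and also that this occurs for
each such neighbor with probability

$$\sum_{b=1}^{n-1} w_{a,b}^{\rm in}\,\frac{bP_G(b)}{z_G}, \tag{40}$$

then we obtain

$$P_G({\rm GOUT}\mid a) = 1 - \left(\sum_{b=1}^{n-1} w_{a,b}^{\rm in}\,\frac{bP_G(b)}{z_G}\right)^{a}, \tag{41}$$

culminating in

$$\bar z_{\rm GOUT} = \sum_{a=0}^{n-1}\left[1 - \left(\sum_{b=1}^{n-1} w_{a,b}^{\rm in}\,\frac{bP_G(b)}{z_G}\right)^{a}\right]\frac{P_G(a)\,a\bar h_a}{\theta_F^{\rm out}}. \tag{42}$$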



4.4 Average waiting time 

As we mentioned in Section 1, assessing a distributed algorithm's time-related
complexities in a fully general asynchronous setting requires that message-passing
causal chains be considered [?]. In our current context, and taking
into account our current knowledge of random graphs and their analysis, obtaining
accurate analytical estimates of those complexities for flooding seems
infeasible. Our choice is then to settle for a less general form of asynchronism
in which the delay associated with the delivery of a message over an edge of G
is roughly constant for all of G's edges. In this case, the waiting time before a
node is reached by a flooding can be taken to be proportional to the distance
in G from the originator to that node.

In order to compare the expected waiting time of heuristic flooding with that
of uninformed flooding, we calculate the ratio $P_t$ of the expected path length
from the originator in F to the expected path length from the originator in G.
Recall that, even though we use the same term, path, in both cases, paths in F
are directed while those in G are undirected.

For uninformed flooding with originator inside the GCC, the expected path
length from the originator can be approximated by $L_{\rm GCC}$ in (22). In order
to obtain the corresponding expression for heuristic flooding, we proceed as we
did for $P_n$ and $P_m$ and consider only the case in which the originator is inside
the GIN, which occurs with probability $\theta_F^{\rm in}/\theta_G$. We also assume that only the
nodes inside the GOUT can be reached by the flooding. Letting $L_{\rm GOUT}$ be the
expected path length from the originator under these assumptions, $P_t$ is such
that

$$P_t = \frac{\theta_F^{\rm in}\,L_{\rm GOUT}}{\theta_G\,L_{\rm GCC}}. \tag{43}$$

The average out-degree inside the GOUT, $\bar z_{\rm GOUT}$, is given by (42). In order
to obtain the expected number of 2-out-neighbors of a randomly chosen node
inside the GOUT, denoted by $\bar z_{\rm GOUT}^{(2)}$, we first consider two adjacent nodes
$u_1$ and $u_2$ having degrees $a_1$ and $a_2$, respectively, in G. If the directed edge
$(u_1 \to u_2)$ does not exist in F, which occurs with probability $1 - h(a_1,a_2)$,
then $u_1$ has no 2-out-neighbors reachable through $u_2$. On the other hand, if
the directed edge does exist, and this has probability $h(a_1,a_2)$ of occurring,
then the number of 2-out-neighbors that $u_1$ can reach through $u_2$ is expected
to be $(a_2-1)\bar h_{a_2}$. So each neighbor of $u_1$ in G provides an expected number of
2-out-neighbors to it that is given by

$$\sum_{a_2=1}^{n-1}(a_2-1)\,\bar h_{a_2}\,h(a_1,a_2)\,\frac{a_2P_G(a_2)}{z_G}. \tag{44}$$

Now let $t_{a_1,a_2}(x)$ be the function

$$t_{a_1,a_2}(x) = x\,(a_2-1)\,\bar h_{a_2}\,h(a_1,a_2)\,\frac{a_2P_G(a_2)}{z_G}. \tag{45}$$

Then $\bar z_{\rm GOUT}^{(2)}$ is given by

$$\bar z_{\rm GOUT}^{(2)} = \sum_{a_1=0}^{n-1} a_1P_G(a_1\mid{\rm GOUT})\sum_{a_2=1}^{n-1} t_{a_1,a_2}(1). \tag{46}$$
Proceeding likewise, we see that, for r > 1, the expected number of r-out-neighbors
of a node inside the GOUT is given by

$$\bar z_{\rm GOUT}^{(r)} = \sum_{a_1=0}^{n-1} a_1P_G(a_1\mid{\rm GOUT})\sum_{a_2=1}^{n-1} t_{a_1,a_2}\!\left(\sum_{a_3=1}^{n-1} t_{a_2,a_3}\!\left(\cdots\sum_{a_r=1}^{n-1} t_{a_{r-1},a_r}(1)\cdots\right)\right). \tag{47}$$

But unlike the $z_G^{(r)}$ of (11), or the $z_{\rm GCC}^{(r)}$ that implicitly led to (22), now the
sequence $\bar z_{\rm GOUT}, \bar z_{\rm GOUT}^{(2)}, \ldots$ is not a geometric progression, and $L_{\rm GOUT}$ cannot
be expressed by an equation analogous to (22). We could, however, in principle,
use (47) to obtain $\lfloor L_{\rm GOUT}\rfloor$ and $\lceil L_{\rm GOUT}\rceil$ by extending the sum
$\bar z_{\rm GOUT} + \bar z_{\rm GOUT}^{(2)} + \cdots$ until a number greater than n − 1 is obtained. The last
two values to go into the sum would be $\bar z_{\rm GOUT}^{(\lfloor L_{\rm GOUT}\rfloor)}$ and $\bar z_{\rm GOUT}^{(\lceil L_{\rm GOUT}\rceil)}$. But the
time complexity of calculating $\bar z_{\rm GOUT}^{(r)}$ is $O(n^r)$, so this method is not really
practical.

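A way out does exist, however, in some special cases. In the case of probabilistic
flooding, for example, we have $h(a_1,a_2) = p$ for $1 \le a_1, a_2 \le n-1$, so
$t_{a_1,a_2}(x)$ is independent of $a_1$, leading (47) to be simplified as

$$\bar z_{\rm GOUT}^{(r)} = \sum_{a_1=0}^{n-1} a_1P_G(a_1\mid{\rm GOUT})\left(\sum_{a_2=1}^{n-1} t_{a_1,a_2}(1)\right)^{r-1}. \tag{48}$$

In this case, the sequence $\bar z_{\rm GOUT}, \bar z_{\rm GOUT}^{(2)}, \ldots$ is indeed a geometric progression,
so we can proceed as we did for the undirected case and obtain

$$L_{\rm GOUT} = \frac{\ln\left[\dfrac{n-1}{\bar z_{\rm GOUT}}\left(\dfrac{\bar z_{\rm GOUT}^{(2)}}{\bar z_{\rm GOUT}}-1\right)+1\right]}{\ln\left(\bar z_{\rm GOUT}^{(2)}/\bar z_{\rm GOUT}\right)}. \tag{49}$$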



The case of a Poisson degree distribution is also amenable to further analysis,
since we can approximate $\bar z_{\rm GOUT}^{(r)} \approx \bar z_{\rm GOUT}\,\rho^{r-1}$, where ρ is the expected
number of out-neighbors that a node chosen by moving along the direction of
a randomly chosen edge of F has [?]. Let us consider such an edge of F. The
joint probability that the node from which this edge outgoes has degree $a_1$ in
G and that the node to which it incomes has degree $a_2$ in G is denoted by
$P_G(a_1, a_2 \mid e)$, where e represents the event that the edge exists in F. Using
Bayes' rule, this probability can be written as

$$P_G(a_1,a_2\mid e) = \frac{P_G(e\mid a_1,a_2)\,P_G(a_1,a_2)}{P(e)} = \frac{h(a_1,a_2)\,a_1P_G(a_1)\,a_2P_G(a_2)}{z_G^2\,P(e)}, \tag{50}$$

which leads to

$$P(e) = \sum_{a_1=1}^{n-1}\sum_{a_2=1}^{n-1}\frac{h(a_1,a_2)\,a_1P_G(a_1)\,a_2P_G(a_2)}{z_G^2} \tag{51}$$

and finally to

$$L_{\rm GOUT} = \frac{\ln\left[\dfrac{n-1}{\bar z_{\rm GOUT}}\,(\rho-1)+1\right]}{\ln\rho}. \tag{52}$$

5 Simulation results 

In our simulations we have used random graphs as models for networks [?]. The
network may thus not be connected, so simulations have only been carried out
inside the largest connected component of the random graph. Also, in order to
ensure that such a component encompasses a large number of nodes, we have
restricted ourselves to graphs for which a GCC is almost surely guaranteed to
exist. Our interest has been to analyze $P_n$, $P_m$, and $P_t$, and to this end we have
simulated probabilistic flooding for p = 0.30, 0.60, 0.90 and heuristic flooding
for α = 0.90, 0.95, 0.99.

We have considered two random-graph models. In the first model G is
generated on n nodes by creating an edge between any two distinct nodes with
fixed probability z/(n − 1) for suitable z. A node in the resulting graph has
degree a with the Poisson probability of mean z [?], that is, $P_G(a) = e^{-z}z^a/a!$.
Also, we have $z_G = z$ and $\langle K_G^2\rangle = z^2 + z$, which by (6) implies that we need
z > 1 for the GCC to almost surely exist.
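A minimal sketch of the first model, together with the restriction to the largest connected component described above, might look as follows in Python. The function names are hypothetical. The quadratic loop over node pairs is only meant for small n, and for n = 10000 a faster sampler would normally be preferred.

```python
import random
from collections import deque

def erdos_renyi(n, z, rng):
    """Sample G on n nodes, joining each of the n(n-1)/2 pairs with probability z/(n-1)."""
    p = z / (n - 1)
    adj = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def largest_component(adj):
    """Return the node set of the largest connected component, found by repeated BFS."""
    seen, best = set(), set()
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

rng = random.Random(7)
G = erdos_renyi(400, 4.0, rng)
mean_degree = sum(len(vs) for vs in G.values()) / len(G)
gcc = largest_component(G)
print(mean_degree, len(gcc))
```

For z = 4 the GCC should contain roughly 98% of the nodes, the root of $\theta = 1 - e^{-4\theta}$.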

Figure 3 shows the results obtained by simulating probabilistic and heuristic
flooding on random graphs with Poisson-distributed degrees for n = 10000 and z
varying from 1 to 10. For each value of z, the simulation consisted of generating
15 random graphs and, for each one, performing 1000 instances of each type of flooding
(uninformed, probabilistic, and heuristic), each from a randomly chosen node
inside the GCC. The simulation results are given as averages of $P_n$, $P_m$, and $P_t$
over the 15000 samples. Notice first of all that agreement between simulation
and analytical results is very good throughout, with a slight deviation only in
parts (c) and (f) of the figure. Such deviations are attributed to the various
approximations of Section 4. Note also that both probabilistic and heuristic
flooding do indeed improve on uninformed flooding as far as the number of
messages used is concerned ($P_m < 1$), but nearly always at the expense of larger
waiting times ($P_t > 1$).

The results for probabilistic flooding (Figure 3(a-c)) show that, given a value
of p, there exists a value of z above which nearly all the nodes of the network
are reached by the flooding. If we require $P_n > 0.99$, then the threshold values
for z are z = 5 for p = 0.90 and z = 9 for p = 0.60. For p = 0.30 the threshold is
greater than 10. The plots for $P_m$ show that $P_m$ increases with z roughly until
these thresholds are reached, remaining constant and approximately equal to p
from there onward. The plots for $P_t$ show a similar but not identical behavior:
like $P_m$, $P_t$ often increases roughly until the thresholds are reached; unlike $P_m$,
$P_t$ starts to decrease slightly as z is further increased. All the plots
show that $P_n$, $P_m$, and $P_t$ are close to 0 when z is near 1. This happens because,
although the network is above the phase transition that gives rise to the GCC
of G with high probability for any z > 1, for z sufficiently near 1 the flooding
digraph F is still below the phase transition at which the GIN and the GOUT
appear (since F is a directed subgraph of G), so they do not yet exist.

Figure 3: Simulation of probabilistic (a-c) and heuristic (d-f) flooding on random
graphs with Poisson-distributed degrees. The plots show $P_n$ (a and d), $P_m$
(b and e), and $P_t$ (c and f) for p = 0.30, 0.60, 0.90 and α = 0.90, 0.95, 0.99.
Solid lines give the analytical predictions of Section 4.

In the case of heuristic flooding (Figure 3(d-f)), the results show that $P_n$
flattens out much earlier than in probabilistic flooding, to a value that depends
on α. This value is about 0.99 for α = 0.99, 0.94 for α = 0.95, and 0.86 for
α = 0.90. The value of $P_m$, in turn, increases sharply until approximately
the value of z at which the flattening of $P_n$ occurs. Past this value, $P_m$ is seen to
decrease continually, which indicates the desirable property that, given a fixed
$P_n$, a progressively smaller fraction of messages is needed as the average degree
increases. In contrast, $P_t$ keeps increasing through the higher values of z. This
occurs because heuristic flooding tends to send messages on relatively longer
paths, since it avoids sending messages between high-degree nodes.

The second random-graph model we have considered is the one in which 
degrees are distributed according to a power law. In such a random graph, the 
probability that a node has degree $a$ is $p(a) = C a^{-\tau}$, where $C$ is a normalizing 
constant and $\tau$ is a parameter. Clearly, in the limit as $n \to \infty$ we have 
$C = 1/\zeta(\tau)$, where $\zeta(x)$ is the Riemann zeta function [?], that is, 
$\zeta(x) = \sum_{k=1}^{\infty} k^{-x}$. We also have $z_G = \zeta(\tau-1)/\zeta(\tau)$ and 
$\langle k^2 \rangle = \zeta(\tau-2)/\zeta(\tau)$, so the criterion given earlier for the 
appearance of the GCC can be seen to translate into 

$$\zeta(\tau-2) > 2\,\zeta(\tau-1).$$

Solving this inequality for $\tau$ numerically reveals that the GCC almost surely 
exists when $\tau < 3.47$. 
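The numerical solution mentioned above can be reproduced with a short script. This is our own sketch, not the authors' code: it assumes the GCC criterion takes the Molloy-Reed form ⟨k²⟩ - 2⟨k⟩ > 0, which with the zeta-function moments given above becomes ζ(τ-2) > 2ζ(τ-1). It evaluates ζ by Euler-Maclaurin summation (so no external library is needed) and bisects on τ. The helper names zeta, gcc_margin and gcc_threshold are hypothetical.

```python
import math


def zeta(s: float, n: int = 100) -> float:
    """Riemann zeta function for real s > 1 via Euler-Maclaurin summation.

    The first n - 1 terms are summed exactly; the tail sum over k >= n is
    replaced by its integral, the endpoint correction and the B_2 and B_4
    Bernoulli-number terms.
    """
    if s <= 1.0:
        raise ValueError("zeta(s) diverges for s <= 1")
    head = math.fsum(k ** -s for k in range(1, n))
    tail = n ** (1.0 - s) / (s - 1.0)           # integral of x^(-s) over [n, inf)
    tail += 0.5 * n ** -s                        # endpoint correction
    tail += s * n ** (-s - 1.0) / 12.0           # B_2 term
    tail -= s * (s + 1.0) * (s + 2.0) * n ** (-s - 3.0) / 720.0  # B_4 term
    return head + tail


def gcc_margin(tau: float) -> float:
    """Positive exactly when zeta(tau - 2) > 2 * zeta(tau - 1)."""
    return zeta(tau - 2.0) - 2.0 * zeta(tau - 1.0)


def gcc_threshold(lo: float = 3.05, hi: float = 4.0, tol: float = 1e-10) -> float:
    """Bisection for the root of gcc_margin; margin > 0 at lo and < 0 at hi."""
    if not gcc_margin(lo) > 0.0 > gcc_margin(hi):
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gcc_margin(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"GCC criterion holds for tau < {gcc_threshold():.4f}")
```

Running the script should report a threshold slightly above 3.47, consistent with the bound stated in the text. The lower bracket 3.05 keeps the argument of ζ(τ-2) above 1, where the series converges.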

Generation of a random graph with degrees thus distributed can be achieved 
in two phases. First, the degrees $a_1, a_2, \ldots, a_n$ of the $n$ nodes, constituting the 
graph's so-called degree sequence, are sampled from the power-law distribution 
and $\sum_{i=1}^{n} a_i$ labeled balls are put inside an imaginary urn.¹ For $1 \le i \le n$, label 
$i$ is given to exactly $a_i$ balls. In the second phase, a pair of balls (labeled, say, 
$u$ and $v$) is picked from the urn at random and an edge $(u, v)$ is added to the 
graph. This process is repeated until the urn becomes empty. This algorithm 
clearly generates a multigraph (a graph in which multiple edges and self-loops 
are allowed to exist), but the resulting graph is consistent with the model used 
in our analysis, where we assumed edge independence throughout. Furthermore, 
even though there are other algorithms to generate random graphs with a given 
degree sequence [?], it is still unclear how to use them within reasonable 
time bounds. The fact that multiple edges or self-loops may now exist in G has 
a direct impact on how the various flooding algorithms are simulated. Specifically, 
a node no longer considers whether to forward the information it receives for 
the first time to each of its neighbors, but rather whether to forward it on each 
of the edges that are incident to it in G. 
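A minimal Python sketch of the two-phase urn procedure follows, written under assumptions of our own. Degrees are drawn from the power law truncated at a hypothetical maximum degree a_max (the text does not say how the unbounded support is handled in practice), and the sum is resampled until it is even, as the footnote prescribes. Repeatedly drawing a random pair of balls until the urn is empty is implemented by shuffling the urn once and pairing consecutive balls, which produces the same uniformly random pairing. The function names are illustrative.

```python
import random
from collections import Counter


def sample_power_law_degrees(n: int, tau: float, a_max: int,
                             rng: random.Random) -> list:
    """Draw n degrees with P(a) proportional to a**(-tau) for 1 <= a <= a_max.

    Truncation at a_max is an assumption of this sketch. Sampling is
    repeated until the degree sum is even, so that all balls can be paired.
    """
    support = range(1, a_max + 1)
    weights = [a ** -tau for a in support]
    while True:
        degrees = rng.choices(support, weights=weights, k=n)
        if sum(degrees) % 2 == 0:
            return degrees


def urn_multigraph(degrees: list, rng: random.Random) -> list:
    """Pair the balls of the urn uniformly at random; return the edge list.

    Node i contributes degrees[i] balls labeled i. Self-loops and repeated
    edges are kept, so the result is a multigraph.
    """
    urn = [i for i, a in enumerate(degrees) for _ in range(a)]
    if len(urn) % 2:
        raise ValueError("the degree sum must be even")
    rng.shuffle(urn)
    return [(urn[j], urn[j + 1]) for j in range(0, len(urn), 2)]


if __name__ == "__main__":
    rng = random.Random(0)
    degs = sample_power_law_degrees(10000, 2.5, 9999, rng)
    edges = urn_multigraph(degs, rng)
    loops = sum(1 for u, v in edges if u == v)
    multi = sum(c - 1 for c in Counter(frozenset(e) for e in edges).values())
    print(len(edges), "edges,", loops, "self-loops,", multi, "repeated edges")
```

Each self-loop contributes 2 to its node's degree, so the degree of every node in the multigraph equals its sampled value.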

Figure 4 shows the results obtained by simulating the three flooding methods 
on random graphs with power-law-distributed degrees for n = 10000 and τ 
varying from 2 to 3. For each value of τ, the simulation consists of generating 
300 random graphs and, for each one, proceeding exactly as indicated for the 
Poisson case. First notice, again as in the Poisson case, that agreement between 
simulation and analytical results is very good for Pn and Pm. For Pt, however, 
the situation is different. Under a power-law distribution for node degrees, 
expressions like (21) and (46), respectively for $\ell_{\mathrm{GCC}}$ and $\ell_{\mathrm{GOUT}}$, are known 
to be problematic [15], and indeed the calculations do not converge and lead 
to wrong values for $\ell_{\mathrm{GCC}}$ and $\ell_{\mathrm{GOUT}}$. Even so, it is curious to note that for 
probabilistic flooding the two errors seem to compensate each other somehow 
and the prediction for Pt comes close to the simulation results, as shown in part 
¹ If $\sum_{i=1}^{n} a_i$ turns out to be odd, the degree sampling is repeated until an even sum is 
obtained. 
Figure 4: Simulation of probabilistic (a-c) and heuristic (d-f) flooding on ran- 
dom graphs with power-law-distributed degrees. The plots show Pn (a and d), 
Pm (b and e), and Pt (c and f) for p = 0.30, 0.60, 0.90 and α = 0.90, 0.95, 0.99. 
Solid lines give the analytical predictions of the preceding sections. 

(c) of the figure. The case of heuristic flooding, on the other hand, still 
lacks a satisfactory analytical prediction for Pt (hence none is shown in part (f) of 
the figure). 
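To make the per-edge forwarding rule concrete, here is a hedged sketch of a single probabilistic-flooding run on a multigraph stored as an edge list. It returns the fraction of nodes reached and the number of messages relative to the 2m messages sent by uninformed flooding, which is how we read Pn and Pm. That normalization, the choice to let the originator send on all of its edges unconditionally, and the function name are assumptions of this illustration. Waiting times are not modeled, so Pt is not computed.

```python
import random
from collections import deque
from typing import List, Optional, Tuple


def probabilistic_flood(n: int, edges: List[Tuple[int, int]], p: float,
                        origin: int,
                        rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """One run of probabilistic flooding with per-edge forwarding decisions.

    A node that receives the information for the first time considers each
    edge incident to it (a self-loop appears twice in its list) and sends on
    that edge with probability p. Assumption: the originator always sends.
    Returns (fraction of nodes reached, messages sent / (2 * m)).
    """
    rng = rng or random.Random()
    incident = [[] for _ in range(n)]       # one entry per edge endpoint
    for u, v in edges:
        incident[u].append(v)
        incident[v].append(u)
    informed = [False] * n
    informed[origin] = True
    queue = deque([origin])
    messages = 0
    while queue:
        u = queue.popleft()
        for v in incident[u]:
            if u != origin and rng.random() >= p:
                continue                    # u decides not to use this edge
            messages += 1
            if not informed[v]:             # only first receptions trigger forwarding
                informed[v] = True
                queue.append(v)
    return sum(informed) / n, messages / (2 * len(edges))
```

Each node forwards at most once, so the set of informed nodes and the message count depend only on the random per-edge decisions, not on the order in which the queue is processed. Averaging both values over many random multigraphs and originators gives estimates comparable in spirit to the simulation points of Figure 4.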

Also noteworthy in Figure 4 is the consistent superiority of probabilistic and 
heuristic flooding over uninformed flooding regarding the number of messages 
sent (Pm < 1). In addition, and contrasting with the Poisson case of Figure 3, 
the same holds for probabilistic flooding, and for sufficiently large τ also for 
heuristic flooding, regarding waiting times (Pt < 1). The price for this, of 
course, is heavy, and is reflected in the often poor values of Pn, with the exception 
of probabilistic flooding with p = 0.90 and of heuristic flooding (more or less 
regardless of the value of α) when τ is near 2. 

Simulation results for probabilistic flooding (Figure 4(a-c)) show that Pn 
increases when τ decreases, but the variation is sometimes not very pronounced. 
For example, for p = 0.90, Pn is about 0.60 when τ = 3.0 and 0.87 
when τ = 2.0. Similar observations hold for Pm and Pt. Notice, in addition, 
that unlike the Poisson case there does not seem to exist any interval within 
the range considered for τ inside which any of the three indicators becomes 
approximately constant. 

Figure 5: Comparison of probabilistic flooding and heuristic flooding in the 
Poisson case. Plots are shown for Pn (a), Pm (b), and Pt (c). 

In heuristic flooding (Figure 4(d-f)), the value of Pn also increases when τ 
decreases, but at a faster pace than in the case of probabilistic flooding. The 
values of Pm and Pt display two different kinds of behavior. While to either 
side of τ ≈ 2.4 the value of Pm decreases as τ is moved farther away, Pt falls 
relatively sharply to the right of τ ≈ 2.4 but rises, albeit slowly and with an 
inversion with respect to the values of α, to the left. From Figure 4(d), τ ≈ 2.4 
appears to coincide with a significant change in the second derivative of Pn as 
a function of τ. 

6 Probabilistic versus heuristic flooding 

In spite of having already obtained results on both types of flooding, we still 
cannot compare them to each other directly, as the parameter p of probabilistic 
flooding has no relation whatsoever to the parameter α of heuristic flooding. In 
this section, our strategy to make such a comparison possible is to first simulate 
heuristic flooding with α = 0.99 and then use the resulting value of Pn to obtain, 
via (23), the value of p for which probabilistic flooding is expected to reach the 
same fraction of the graph's nodes. 
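The matching step amounts to inverting a predictor of Pn that increases with p, which bisection does robustly. Equation (23) is not reproduced in this section, so the sketch below takes the predictor as a parameter. Purely for illustration, it plugs in the textbook bond-percolation fixed point S = 1 - exp(-pzS) for a Poisson graph of average degree z as a stand-in; this is an assumption of ours and not necessarily (23). The names invert_monotone and poisson_out_fraction are hypothetical.

```python
import math
from typing import Callable


def invert_monotone(f: Callable[[float], float], target: float,
                    lo: float = 0.0, hi: float = 1.0,
                    tol: float = 1e-9) -> float:
    """Return x in [lo, hi] with f(x) close to target, for f increasing."""
    if not f(lo) <= target <= f(hi):
        raise ValueError("target lies outside the range of f on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_out_fraction(p: float, z: float, iters: int = 1000) -> float:
    """Stand-in predictor (NOT equation (23)): largest root S of
    S = 1 - exp(-p * z * S), found by fixed-point iteration from S = 1."""
    s = 1.0
    for _ in range(iters):
        s = 1.0 - math.exp(-p * z * s)
    return s


if __name__ == "__main__":
    z = 5.0
    pn_heuristic = 0.9   # hypothetical value of Pn measured for heuristic flooding
    p = invert_monotone(lambda q: poisson_out_fraction(q, z), pn_heuristic)
    print(f"probabilistic flooding needs p = {p:.4f}")
```

With the real predictor substituted for the stand-in, the same bisection yields the value of p used for each z in the comparison.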

Our results under this strategy are shown in Figure 5 for the Poisson case. 
Notice, first, that the strategy seems indeed to be effective, since the plots for 
Pn are practically the same for both types of flooding. For z right past 1, all 
six plots exhibit an anomaly, which is once again likely due to the fact that, at 
these values of z, the phase transition that gives rise to the GIN and the GOUT 
has not yet taken place. 

What is most interesting, though, is the evident trade-off between Pm and Pt 
that can be seen in parts (b) and (c) of the figure. Clearly, under the constraint 
that Pn is roughly the same for both probabilistic and heuristic flooding, for z 
larger than approximately 2.5, Pm is significantly larger for probabilistic flooding 
than it is for heuristic flooding. On the other hand, Pt is significantly larger for 
heuristic flooding across practically the whole range of z values that Figure 5 
encompasses. 

But notwithstanding this trade-off, Figure 5 also suggests that the two types 
of flooding become more and more similar to each other as z is increased. In 
fact, as the nodes' degrees become larger and G more homogeneous (approach- 
ing, in the limit, the complete graph on n nodes), the heuristic function used 
by heuristic flooding approaches a small constant, and heuristic flooding 
progressively becomes probabilistic flooding. 

The corresponding results for graphs with degrees distributed according to 
a power law are shown in the plots of Figure 6. Once again, our strategy's 
effectiveness is corroborated by part (a) of the figure. Also, parts (b) and (c) 
reveal the same trade-off between Pm and Pt that we observed in the Poisson 
case, along with a similar tendency, especially in the case of part (b), for the two 
algorithms to resemble each other as τ is increased toward 3. The justification 
for part (b) is that, at these values for τ, heuristic flooding has a small value 
for Pn (cf. part (a)), one that can be achieved by probabilistic flooding with 
a value of p for which Pm remains relatively low. As for part (c), increasing 
τ toward its higher values makes the occurrence of nodes of very high degree 
ever less likely. Consequently, heuristic flooding no longer refrains so strongly 
from sending the information being broadcast to the high-degree nodes of the 
network. This makes path lengths tend to lean toward those of probabilistic 
flooding. 

But what is especially noteworthy in the power-law case is that the 
superiority of heuristic flooding in terms of Pm becomes very pronounced as τ is 
decreased below roughly τ = 2.5. This happens because, under a power law, 
the graph tends to contain a large set of nodes of small degree, many of them 
of degree 1. In order to reach the same number of nodes as heuristic flooding, 
probabilistic flooding must run with a value for p that is near 1, which causes 
an unnecessarily high number of messages to be sent. Heuristic flooding, in 
turn, is relatively insensitive to the plethora of low-degree nodes, since it 
attempts to provide each and every node with the same probability of receiving 
the information. 

7 Conclusions 

Flooding a network probabilistically for information dissemination is an attempt 
at reducing the heavy communications demand that uninformed flooding incurs 
in terms of how many messages are needed. The drawback, of course, is that 
the guarantee of network-wide information delivery may be lost. In this paper 
we started with the recently introduced probabilistic flooding, in which every 
node forwards the information to each of its neighbors with a constant probability. 

Then we introduced an alternative technique, called heuristic flooding, which 
employs degree-based probabilistic decisions at the nodes, aiming at a 
predefined probability, the same for all nodes, that a node receives the information 
being disseminated. We have also contributed a detailed mathematical 
analysis of both probabilistic flooding and heuristic flooding, comprising analytical 
predictions of the techniques' reachability, their communications requirements, 
and an indicator related to the time needed for completion of the flooding. 

Figure 6: Comparison of probabilistic flooding and heuristic flooding for the 
power-law case. Plots are shown for Pn (a), Pm (b), and Pt (c). 

Our extensive simulation results have revealed the two techniques' main 
characteristics on random graphs with Poisson- and power-law-distributed de- 
grees. They have also indicated an excellent agreement between theory and 
experimentation in most cases. 

In addition, one final set of simulations designed especially to allow a mean- 
ingful comparison between the two flooding techniques demonstrated an in- 
teresting trade-off involving them: in general, heuristic flooding outperforms 
probabilistic flooding in terms of communications requirements, but the 
opposite holds in terms of the delay required for completion. Curiously, the two 
sides of this trade-off do not carry equal weight: for example, under a power 
law with a relatively small parameter τ, the communications-related 
gain of heuristic flooding over probabilistic flooding significantly surpasses its 
delay-related loss. 